Densification of Genetic Map and Stable Quantitative Trait Locus Analysis for Amino Acid Content of Seed in Soybean (Glycine max L.)

Soybean, a primary vegetable protein source, has a favorable amino acid profile; however, its composition still falls short of human nutritional demands. Soybean amino acid content is a quantitative trait controlled by multiple genes. In this study, an F2 population of 186 individual plants derived from the cross between ChangJiangChun2 and JiYu166 served as the mapping population. Building on the genetic map previously published by our lab, we increased the marker density and constructed a new genetic map containing 518 SSR (simple sequence repeat) markers and 64 InDel (insertion-deletion) markers, with an average distance of 5.27 cM and a total length of 2881.2 cM. The content of eight essential amino acids was evaluated in the F2:5 and F2:6 generations and as BLUP (best linear unbiased prediction) values. A total of 52 QTLs (quantitative trait loci) and 13 QTL clusters were identified, among which Loci02.1 and Loci11.1 emerged as stable QTL clusters, and candidate genes were explored within these regions. Through GO enrichment and gene annotation, 16 candidate genes associated with soybean essential amino acid content were predicted. This study lays a foundation for elucidating the regulatory mechanisms of essential amino acid content and contributes to germplasm innovation in soybean.


Introduction
Soybean (Glycine max L. Merr.) is one of the most important seed crops grown worldwide. Domestication and improvement have shaped soybean into the most important dual-function crop, providing highly valuable seed protein and oil, which together account for its high economic value [1,2]. Soybean is rich in protein, making it an essential plant-based protein source in human diets [3]. The bioavailability of soy protein in humans is equivalent to that of proteins derived from milk and eggs [4,5]. According to data published by the Food and Agriculture Organization (FAO), global soybean production in 2022 amounted to 348.8 million tons, of which only about 6% was used for direct human consumption, while approximately 75% was used as animal feed [6], making soybean a primary source of protein in animal feed formulations [7].
Soybean has a relatively high protein content and an excellent amino acid composition [8]. However, this composition still has deficiencies; in particular, soybean is low in sulfur-containing amino acids [9,10] and therefore does not fully meet the amino acid requirements of animals and humans [11,12]. The lack of sulfur-containing amino acids is a significant limiting factor for improving soybean seed quality [13]. Therefore, increasing the content of certain amino acids can improve the nutritional value of soybean [14]. Constructing a high-density genetic map and identifying the QTLs and underlying genes that affect essential amino acid content can contribute to enhancing the nutritional value of soybean seed.
The content of essential amino acids is a complex quantitative trait governed by multiple genes and highly influenced by environmental factors, which poses challenges for plant breeders selecting for this trait [9,15,16]. QTL analysis offers a robust tool for soybean breeders, facilitating the discovery of novel sources of variation and the investigation of the genetic determinants underlying quantitatively inherited traits. According to the latest release of SoyBase (http://www.soybase.org, accessed on 7 March 2024), 112 QTLs related to essential amino acids have been mapped. Panthee et al. developed recombinant inbred lines from the cross N87-984-16 × TN93-99 to identify genomic regions controlling amino acid content [17]. Fallen et al. constructed recombinant inbred lines from Essex and Williams82, identifying 10 QTLs related to amino acid content [18]. Wang et al. developed two populations and detected 8 QTLs associated with the content of methionine (Met) and lysine (Lys) [19]. Ma et al. used recombinant inbred lines from the cross Kefeng 1 × Nannong 1138-2 to identify 9 QTLs associated with cysteine (Cys) and methionine (Met) [20].
This study aims to construct a high-density genetic map and identify QTLs related to essential amino acids in soybean seed. The mapping population was derived from a cross between ChangJiangChun2 (CJC2) and JiYu166 (JY166) and was evaluated in three different environments. The findings are anticipated to aid marker-assisted selection (MAS) and improve our understanding of the genetic basis of essential amino acid composition in soybean.

Trait Phenotype Analysis
As shown in Table 1, the two parents differed in all eight traits across the three environments. Except for Trp and Phe, the contents of the other six amino acids were higher in JY166 than in CJC2. The population showed clear segregation: the coefficients of variation of the traits ranged from 0.44% to 22.2%, and transgressive segregation was observed for each trait. The frequency distribution histograms (Figure 1) showed an approximately normal distribution for all eight traits in the three environments, consistent with the inheritance pattern of quantitative traits. Correlation analysis (Figure 2) showed that the eight essential amino acids were correlated with one another. Trp showed highly significant negative correlations with four amino acids, including a notably strong negative correlation with Met. Lys was positively correlated with Thr and Ile. The correlation coefficient between Phe and Leu was the largest, reaching 0.970, suggesting that these relationships could guide the selection of suitable varieties in breeding.
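The two summary statistics used above can be illustrated with a short Python sketch. It is only an illustration: the function names and the example values are hypothetical and do not come from Table 1. It assumes that the coefficient of variation is the sample standard deviation divided by the mean, and that a progeny value counts as transgressive when it falls outside the range spanned by the two parents.

```python
import statistics


def coefficient_of_variation(values):
    """Return the coefficient of variation in percent (sample SD / mean * 100)."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(values) / mean * 100.0


def transgressive_fraction(progeny, parent_a, parent_b):
    """Return the fraction of progeny values lying outside the parental range."""
    low, high = sorted((parent_a, parent_b))
    outside = [v for v in progeny if v < low or v > high]
    return len(outside) / len(progeny)


# Hypothetical contents of one amino acid for five progeny lines (not real data).
example_contents = [2.0, 2.2, 2.4, 2.6, 2.8]
cv_percent = coefficient_of_variation(example_contents)
fraction_outside = transgressive_fraction(example_contents, 2.3, 2.5)
```

A large fraction of progeny outside the parental range, together with a continuous distribution, is the pattern described in the text as transgressive segregation.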

Genetic Map Construction
Based on the resequencing data of CJC2 and JY166, 162 pairs of InDel primers were selected. After genotyping the parental lines, 64 pairs of primers were used in constructing the genetic map. Combined with the previous genetic map constructed in our laboratory [21], a new linkage map was constructed, which contained 582 marker loci distributed across the 20 soybean chromosomes. The genetic map was 2881.2 cM in length, with an average map distance of 5.27 cM (Figure 3). The longest linkage group was chromosome 18 (217.4 cM), and the shortest was chromosome 16 (60.2 cM). Chromosome 19 carried the most markers (63), and chromosome 16 carried the fewest (10). The average distance between markers was largest on chromosome 8 (9.27 cM) and smallest on chromosome 19 (2.43 cM).

Plants 2024, 13, 2020

QTLs Identified for Essential Amino Acids
Using the multiple QTL model (MQM) mapping method and the constructed linkage groups, a total of 52 stable QTLs associated with essential amino acids were mapped in more than two environments (Figure 4 and Table 2).
A total of 10 QTLs for valine were identified and mapped on seven chromosomes, explaining 8.40% to 20.30% of the phenotypic variation. The favorable alleles of three QTLs were derived from CJC2, and the favorable alleles of the other three QTLs were derived from JY166.
A total of 5 QTLs for threonine were identified and mapped on five chromosomes, explaining 9.70% to 13.0% of the phenotypic variation. The favorable alleles of most QTLs were derived from CJC2, except for qThr18.1.
A total of 13 QTLs for phenylalanine were identified and mapped on ten chromosomes, explaining 8.10% to 23.80% of the phenotypic variation. The favorable alleles of ten QTLs were derived from CJC2, and the favorable alleles of the other three QTLs were derived from JY166.
A total of 4 QTLs for methionine were identified and mapped on four chromosomes, explaining 8.20% to 13.60% of the phenotypic variation. All favorable alleles were derived from JY166.
A total of 6 QTLs for lysine were identified and mapped on six chromosomes, explaining 7.40% to 17.20% of the phenotypic variation. The favorable alleles of five QTLs were derived from CJC2, and the favorable allele of the remaining QTL was derived from JY166.
A total of 7 QTLs for leucine were identified and mapped on six chromosomes, explaining 7.40% to 17.20% of the phenotypic variation. The favorable alleles of the QTLs were derived from CJC2, except for qLeu11.1.
A total of 5 QTLs for isoleucine were identified and mapped on four chromosomes, explaining 8.90% to 14.60% of the phenotypic variation. All favorable alleles of the QTLs were derived from CJC2.
A total of 2 QTLs for tryptophan were identified and mapped on two chromosomes, explaining 7.40% and 17.20% of the phenotypic variation. The favorable alleles of both QTLs were derived from JY166.

Identification and Analysis of QTL Clusters
Following the principle of stability and effectiveness, a total of 13 QTL clusters were located on 10 chromosomes in this study (Table 3). Loci02.1 contained the largest number of QTLs, seven in total. Loci12.1 was associated with five traits, Loci12.2 and Loci17.1 were each associated with four traits, and Loci19.1 was associated with three traits. The remaining QTL clusters were each associated with two traits. In terms of the number of controlled traits and the stability of the detected QTLs, the two most important clusters were Loci02.1 and Loci11.1.

Candidate Gene Prediction
Within the intervals of the two important clusters, 236 genes were found for Loci02.1 in the physical region from 43.68 Mb to 45.65 Mb on chromosome 2, and 203 genes were found for Loci11.1 in the region from 10.12 Mb to 12.78 Mb on chromosome 11. GO analysis was conducted on all of these genes using the GO enrichment tools of SoyBase (http://www.soybase.org, accessed on 9 March 2024) and the Wm82 genome assembly (Figure 5). Among the genes of Loci02.1 and Loci11.1, 34 and 31 genes, respectively, were not assigned to any GO term. By integrating gene functional annotation, a total of 16 candidate genes potentially involved in regulating essential amino acid content were identified (Table 4).

Discussion
Soybean is a crucial crop globally, with a high protein content and an excellent amino acid composition. However, its seed essential amino acid content remains a challenge, as soybean is deficient in sulfur-containing amino acids. Previous research indicates that simply increasing soybean crude protein content may not raise essential amino acid concentration [22]. Hence, it is imperative to investigate QTLs associated with soybean essential amino acid content.
One limitation of map-based cloning in soybean is the insufficiency of molecular markers [23]. In this study, QTL mapping was performed on the F2 population derived from the cross between CJC2 and JY166, and a genetic map containing 518 SSR markers and 64 InDel markers was constructed, with an average map distance of 5.27 cM (Figure 3). Compared with previous studies on QTL mapping of essential amino acid content in soybean, the map has a higher marker density, which improves QTL resolution and helps to fine-map candidate genes [17,22,24].
In this study, the content of eight essential amino acids showed extensive and continuous variation, and there was clear transgressive segregation (Figure 1), indicating that these traits are complex quantitative traits controlled by multiple genes, which is consistent with previous results [18,22]. A total of 52 stable QTLs were detected by MQM, and most of the favorable alleles came from CJC2. Of these QTLs, qIle01.1 was consistent with the results of Panthee et al. [17], and qPhe19.1 had a high LOD value (8.97) and explained a large share of the phenotypic variance (23.8%); this region may contain candidate genes controlling Phe (Table 2).
We detected overlapping QTLs for multiple traits and 13 QTL clusters located on chromosomes 1, 2, 3, 6, 7, 11, 12, 17, 18, and 19 (Table 3). Each QTL cluster was associated with two or more traits related to seed essential amino acids. In terms of the number of controlled traits and QTL environmental stability, Loci02.1 and Loci11.1 might be selected for further research. QTL clusters may represent gene/QTL linkage or pleiotropic effects of a single QTL within the same genomic region [25]. The correlation analysis showed that the correlation between Phe and Leu was the largest, reaching 0.970, and six QTL clusters were associated with both Phe and Leu, which deserves further consideration. Fallen et al. also reported a positive correlation between these two amino acids [18]. The physical locations of Loci02.1 and Loci11.1 range from 43.68 Mb to 45.65 Mb and from 10.12 Mb to 12.78 Mb, respectively, and 439 genes were obtained within these intervals. After screening by gene function annotation, 16 candidate genes for soybean seed essential amino acid content were obtained.
Among these candidate genes, we identified several with homologs in Arabidopsis thaliana (AT), some of which are associated with our target traits. Here, we present them to provide a reference for further investigation (Table 4). Glyma.02g254300 encodes a protein containing Leu-rich repeats and a degenerate F-box motif and is related to the genetic pathway through which jasmonic acid (JA) modulates petal senescence [26]. Glyma.02g260900 was found to be related to the synthesis of tyrosine (Tyr) and is typically strongly feedback-inhibited by Tyr [27]. Glyma.02g263900 was related to the male and female gametes in sexual reproduction [28]. Glyma.02g270000 could be related to regulating the hexameric structure and ATPase activity of AtCDC48 [29]. Glyma.02g270700 encodes a leucine-rich repeat-malectin receptor kinase involved in Arabidopsis immune responses triggered by β-1,4-D-xylo-oligosaccharides from plant cell walls [30]. Glyma.02g271700 was important in chromatin regulation and in maintaining transcriptional gene silencing (TGS) in some genomic regions of AT [31]. Glyma.11g144900 could be involved in many developmental processes, including the regulation of premature cell death [32].
In light of the relevant literature, the 16 genes identified through QTL mapping and gene function annotation are considered important candidate regulators of soybean essential amino acid content. Nonetheless, their precise functional mechanisms require further study.

Plant Materials
ChangJiangChun2 (CJC2) is a high-protein variety from Chongqing, while JiYu166 (JY166) is a high-oil variety widely cultivated in northern regions. The two parental lines differ substantially in their genetic background and are distantly related. In this study, 186 individual plants of the F2 population derived from the cross between CJC2 and JY166 were used as the mapping population.
The F2:5 population was planted during the summer of 2023 at the Teaching and Experimental Base of Southwest University in Chongqing (23CQ), while the F2:6 population was planted in the winter of 2023 in Yuanjiang, Yunnan (23YN). Both populations were sown in single rows, with a row length of 1 m, row width of 0.5 m, and plot spacing of 0.2 m, with two seeds planted per plot. Following standard field management practices, all plants were harvested at maturity, and the essential amino acid content of the seed was subsequently determined.

DNA Extraction and Genotyping
Genomic DNA was extracted from 189 plants, including the F2 population, the two parents, and the F1 plant. A total of 2933 SSR primer pairs, derived from the soybean database SoyBase (http://www.soybase.org, accessed on 9 March 2024) [33], were synthesized by Biotech Bioengineering Co., Ltd. (Shanghai, China). Some of these BARCSOYSSR primers were renamed as SWU in this study. In addition, a total of 162 insertion-deletion (InDel) primer pairs were synthesized by comparing the parental 10× resequencing data with the soybean Wm82.a4.v1 reference genome and selecting InDel sites > 15 bp as markers. The primer sequences of the SSR and InDel markers are listed in Table S1. PCR amplification followed the protocol described by Zhang et al. [34], and primers that were polymorphic between the two mapping parents were used to genotype the individual plants of the F2 population. Band types identical to CJC2 and JY166 were recorded as A and B, respectively, the heterozygous band type was recorded as H, and a deletion was recorded as U. The resulting data were then collected for further analysis. Additive effects were defined with respect to the CJC2 allele, so a positive additive effect indicates that the CJC2 allele increases the phenotypic value.
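The A/B/H/U scoring rule described above can be written down as a minimal Python sketch. The function name, the representation of a band pattern as a set of fragment sizes, and the example sizes are all hypothetical; the sketch also assumes that any pattern matching neither parent nor their combination is scored as U, which the text does not specify.

```python
def score_genotype(sample_bands, cjc2_bands, jy166_bands):
    """Return 'A', 'B', 'H', or 'U' for one plant at one marker.

    Each argument is a collection of band sizes (bp) read from the gel.
    """
    sample = frozenset(sample_bands)
    cjc2 = frozenset(cjc2_bands)
    jy166 = frozenset(jy166_bands)
    if sample and sample == cjc2:
        return "A"  # same band type as CJC2
    if sample and sample == jy166:
        return "B"  # same band type as JY166
    if sample and sample == cjc2 | jy166:
        return "H"  # both parental bands present (heterozygous)
    return "U"  # no band; unexpected patterns also fall here (assumption)


# Hypothetical marker: CJC2 gives a 150 bp band, JY166 a 170 bp band.
example_calls = [
    score_genotype(bands, {150}, {170})
    for bands in ({150}, {170}, {150, 170}, set())
]
```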

Determination of Traits
A FOSS NIRS DS2500 (Foss Analytical A/S, Hilleroed, Denmark) was used to determine the content of eight essential amino acids in seeds, scanning from 400 to 2500 nm in transmittance mode with a 1 mm pathlength. A reference scan was taken once in every 10 sample scans. To increase the signal-to-noise ratio, both reference and sample spectra were averaged from 32 scans. Samples were temperature-equilibrated at 33 °C (approximately 3 min) in the instrument before scanning.
The phenotype data were subjected to statistical analysis using Excel 2019, and the data were processed and plotted using Origin 2019. The best linear unbiased prediction (BLUP) values were computed from the amino acid contents of F2:5 and F2:6 using the R programming language [35].
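The text does not state which model or R package was used for the BLUP values, so the following Python sketch is only an illustration of the general idea, not the authors' procedure. It assumes a balanced one-way random-effects model in which each line is a random effect and the environments act as replicates, with variance components estimated by the method of moments. All names and values are hypothetical.

```python
from statistics import fmean


def blup_balanced(observations):
    """Shrink each line mean toward the grand mean (balanced one-way model).

    observations: dict mapping a line ID to a list of values, one per
    environment; every line must have the same number of values.
    """
    lines = list(observations)
    n_lines = len(lines)
    n_env = len(observations[lines[0]])
    if n_lines < 2 or n_env < 2:
        raise ValueError("need at least two lines and two environments")
    if any(len(observations[k]) != n_env for k in lines):
        raise ValueError("data must be balanced")

    line_means = {k: fmean(observations[k]) for k in lines}
    grand_mean = fmean(list(line_means.values()))

    # Mean squares from the one-way analysis of variance.
    ms_line = n_env * sum((m - grand_mean) ** 2 for m in line_means.values())
    ms_line /= n_lines - 1
    ms_error = sum(
        (y - line_means[k]) ** 2 for k in lines for y in observations[k]
    ) / (n_lines * (n_env - 1))

    # Moment estimates of the variance components (truncated at zero).
    var_line = max((ms_line - ms_error) / n_env, 0.0)
    denom = var_line + ms_error / n_env
    shrinkage = var_line / denom if denom > 0 else 0.0

    return {k: grand_mean + shrinkage * (m - grand_mean)
            for k, m in line_means.items()}


# Hypothetical contents of one amino acid for three lines in two environments.
example_blups = blup_balanced(
    {"line1": [2.0, 2.2], "line2": [2.5, 2.7], "line3": [3.0, 3.2]}
)
```

In this toy example the line variance dominates the error variance, so the predicted values stay close to the raw line means; noisier data would be pulled more strongly toward the grand mean.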

Map Construction and QTL Detection
Marker linkage analysis was conducted using JoinMap 4.0 to establish a genetic linkage map with an LOD score of 3.0, and the Kosambi mapping function was used to convert recombination frequencies to map units [36,37]. QTL mapping for all traits was performed with the MQM method in MapQTL 6.0; a 1000-permutation test at a significance level of p = 0.05 and an LOD threshold of 3.0 were used to declare the presence of QTLs. The graphical representation of QTLs on the linkage groups was created using MapChart 2.2 [38]. Each qualifying interval was designated a QTL. QTLs were named with the letter "q", the trait name, the chromosome number, and a serial number. For example, the first QTL detected on chromosome 1 for Val was named qVal01.1.
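For reference, the Kosambi mapping function converts a recombination fraction r between two markers into a map distance of 25 ln((1 + 2r) / (1 − 2r)) cM. The short Python sketch below illustrates this formula and its inverse; it is not the JoinMap implementation, and the function names are hypothetical.

```python
import math


def kosambi_cm(r):
    """Map distance in cM for a recombination fraction r (0 <= r < 0.5)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


def kosambi_recombination(distance_cm):
    """Recombination fraction for a Kosambi map distance given in cM."""
    return 0.5 * math.tanh(distance_cm / 50.0)


# A recombination fraction of 0.10 corresponds to slightly more than 10 cM.
example_distance = kosambi_cm(0.10)
```

For small r the distance in cM is close to 100r, while for larger r the function corrects for undetected double crossovers under partial interference.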

QTL Cluster Identification
A QTL cluster is a densely populated QTL region of a chromosome that contains multiple QTLs associated with various traits [39]. All QTLs were sorted by their physical positions on their respective chromosomes. If two or more QTLs were located at the same physical position, they were grouped into a QTL cluster. The QTL clusters were labeled with "Loci". For example, in the QTL cluster denoted Loci01.1, "Loci" indicates a QTL cluster, "01" indicates the chromosome on which the cluster was detected, and ".1" indicates the order of the cluster on that chromosome.
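The grouping and naming rule can be sketched in Python as follows. This is an illustration only: it assumes that "the same physical position" means overlapping physical intervals, which the text does not define precisely, and the function name, QTL names, and coordinates in the example are hypothetical rather than values from Table 2 or Table 3.

```python
from collections import defaultdict


def find_qtl_clusters(qtls):
    """Group overlapping QTL intervals per chromosome and name the clusters.

    qtls: iterable of (name, chromosome, start_mb, end_mb) tuples.
    Returns a dict such as {"Loci02.1": ["qA", "qB"], ...}.
    """
    by_chrom = defaultdict(list)
    for name, chrom, start_mb, end_mb in qtls:
        by_chrom[chrom].append((start_mb, end_mb, name))

    clusters = {}
    for chrom in sorted(by_chrom):
        groups = []
        # Sort by physical position, then merge intervals that overlap.
        for start_mb, end_mb, name in sorted(by_chrom[chrom]):
            if groups and start_mb <= groups[-1]["end"]:
                groups[-1]["names"].append(name)
                groups[-1]["end"] = max(groups[-1]["end"], end_mb)
            else:
                groups.append({"end": end_mb, "names": [name]})
        # Only groups with two or more QTLs count as clusters.
        multi = [g["names"] for g in groups if len(g["names"]) >= 2]
        for order, names in enumerate(multi, start=1):
            clusters[f"Loci{chrom:02d}.{order}"] = names
    return clusters


# Hypothetical QTL intervals in Mb (names and coordinates are invented).
example_clusters = find_qtl_clusters([
    ("qPhe02.1", 2, 43.7, 45.0),
    ("qLeu02.1", 2, 44.5, 45.6),
    ("qVal02.1", 2, 10.0, 11.0),
    ("qMet11.1", 11, 10.2, 11.5),
    ("qLeu11.1", 11, 11.0, 12.7),
])
```

In this example the isolated QTL on chromosome 2 is not assigned to a cluster, and the two overlapping pairs become Loci02.1 and Loci11.1.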

Candidate Gene Prediction
Candidate genes within the intervals of the promising QTL clusters were retrieved from SoyBase (http://www.soybase.org, accessed on 10 March 2024). GO (Gene Ontology) enrichment analysis was performed, and the families and subfamilies, molecular functions, biological processes, and pathways of the genes in the identified QTLs were analyzed. Finally, candidate genes related to essential amino acids were screened.
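The interval search itself was done through SoyBase; the Python sketch below only illustrates the underlying filtering step on a local gene list. The Gene record, the function name, and the gene IDs and coordinates are hypothetical, and the sketch assumes that a gene is retained when it overlaps the interval at all. The interval bounds in the example are those reported for Loci02.1 (43.68 Mb to 45.65 Mb on chromosome 2).

```python
from typing import NamedTuple


class Gene(NamedTuple):
    gene_id: str
    chromosome: int
    start_bp: int
    end_bp: int


def genes_in_interval(genes, chromosome, start_mb, end_mb):
    """Return the genes on a chromosome that overlap [start_mb, end_mb]."""
    start_bp = round(start_mb * 1_000_000)
    end_bp = round(end_mb * 1_000_000)
    return [
        g for g in genes
        if g.chromosome == chromosome
        and g.start_bp <= end_bp
        and g.end_bp >= start_bp
    ]


# Hypothetical gene records (IDs and coordinates are invented).
example_genes = [
    Gene("geneA", 2, 43_700_000, 43_705_000),   # inside the interval
    Gene("geneB", 2, 45_649_000, 45_660_000),   # overlaps the right edge
    Gene("geneC", 2, 46_000_000, 46_002_000),   # outside the interval
    Gene("geneD", 11, 44_000_000, 44_001_000),  # different chromosome
]
example_hits = [
    g.gene_id for g in genes_in_interval(example_genes, 2, 43.68, 45.65)
]
```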

Conclusions
In this study, the genetic map previously constructed in our laboratory was densified by adding 64 InDel markers. Using the MQM method, QTLs associated with essential amino acids in soybean were mapped, and 52 stable QTLs were identified. By integrating Gene Ontology (GO) enrichment analysis and gene function annotation, 16 genes related to the target traits were ultimately identified. The candidate genes described in this study provide a theoretical basis and genetic resources for the future improvement of soybean essential amino acid content.

Figure 2. Correlation analysis of eight essential amino acid contents (** = p < 0.001, * = p < 0.05). The data in this figure are the average results of three environments.

Table 1. Characteristics of seed essential amino acid contents in three environments.
23CQ and 23YN indicate the summer of 2023 in Chongqing and the winter of 2023 in Yunnan, respectively. BLUP: best linear unbiased prediction, obtained from the essential amino acid content of F2:5 and F2:6.

Table 2. QTLs identified for essential amino acids in three environments.
a 23CQ and 23YN indicate 2023 in Chongqing and winter 2023 in Yunnan, respectively. b PVE: phenotypic variance explained.

Table 3. QTL clusters associated with essential amino acids in soybean.

Table 4. Candidate genes for essential amino acid content of soybean.
